NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Pi-talk: Edge-Only, Adapter-Tuned Multimodal Small Language Model for Safe, Real-Time In-Vehicle Dialogue

Pissinou_Makki, Alex; Tarokh, Vahid Tarokh; Enamorado, Louis Lago; Carroz, Carlos Carroz; Garcia, Miguel; Pissinou, Niki (December 2025, IEEE)

Natural-language interaction between passengers and autonomous vehicles is essential for trust, safety, and user experience, but deploying Large Language Models (LLMs) on automotive edge platforms is constrained by compute, memory, energy, and privacy. We present Pi-talk, an edge-only system that enables real-time passenger–vehicle dialogue using a Small Language Model (SLM) running entirely on embedded hardware. Pi-talk performs multimodal fusion of onboard camera, ultrasonic distance, and navigation context via a lightweight encoder–adapter module that aligns modalities into compact semantic tokens for a pre-trained SLM. The SLM produces context-aware explanations of driving decisions, route options, and situational updates without cloud connectivity. Safety is enforced through a real-time safety envelope that gates responses and actions using distance thresholds and timing constraints. We further adapter-tune the SLM (on-device or offline) and deploy it with INT8 quantization and an Open Neural Network Exchange (ONNX) runtime to achieve efficient batch = 1 inference on Raspberry-Pi–class hardware. We evaluate task quality (evaluation loss), end-to-end latency, CPU utilization, and memory footprint, and include ablations contrasting unimodal vs. fused inputs. Results show that Pi-talk sustains few-second, edge-only inference while meeting stringent resource and latency limits and maintaining the safety envelope required for autonomous operation. To our knowledge, Pi-talk is among the first edgeonly, multimodal passenger–vehicle dialogue systems that both fine-tune and run a small language model entirely on Raspberry Pi–class, CPU-only hardware with an explicit while enforcing a runtime safety envelope.
more » « less
Free, publicly-accessible full text available December 5, 2026
Decoding Rewards in Competitive Games: Inverse Game Theory with Entropy Regularization

Liao, Junyi; Zhu, Zihan; Fang, Ethan X; Yang, Zhuoran; Tarokh, Vahid (August 2025, ICML Proceedings)

Estimating the unknown reward functions driving agents' behavior is a central challenge in inverse games and reinforcement learning. This paper introduces a unified framework for reward function recovery in two-player zero-sum matrix games and Markov games with entropy regularization. Given observed player strategies and actions, we aim to reconstruct the underlying reward functions. This task is challenging due to the inherent ambiguity of inverse problems, the non-uniqueness of feasible rewards, and limited observational data coverage. To address these challenges, we establish reward function identifiability using the quantal response equilibrium (QRE) under linear assumptions. Building on this theoretical foundation, we propose an algorithm to learn reward from observed actions, designed to capture all plausible reward parameters by constructing confidence sets. Our algorithm works in both static and dynamic settings and is adaptable to incorporate other methods, such as Maximum Likelihood Estimation (MLE). We provide strong theoretical guarantees for the reliability and sample-efficiency of our algorithm. Empirical results demonstrate the framework’s effectiveness in accurately recovering reward functions across various scenarios, offering new insights into decision-making in competitive environments.
more » « less
Free, publicly-accessible full text available August 15, 2026
In-Context Reinforcement Learning From Suboptimal Historical Data

Dong, Juncheng; Guo, Moyang; Fang, Ethan X; Yang, Zhuoran; Tarokh, Vahid (August 2025, ICML Proceedings)

Transformer models have achieved remarkable empirical successes, largely due to their in-context learning capabilities. Inspired by this, we explore training an autoregressive transformer for in-context reinforcement learning (ICRL). In this setting, we initially train a transformer on an offline dataset consisting of trajectories collected from various RL tasks, and then fix and use this transformer to create an action policy for new RL tasks. Notably, we consider the setting where the offline dataset contains trajectories sampled from suboptimal behavioral policies. In this case, standard autoregressive training corresponds to imitation learning and results in suboptimal performance. To address this, we propose the Decision Importance Transformer (DIT) framework, which emulates the actor-critic algorithm in an in-context manner. In particular, we first train a transformer-based value function that estimates the advantage functions of the behavior policies that collected the suboptimal trajectories. Then we train a transformer-based policy via a weighted maximum likelihood estimation loss, where the weights are constructed based on the trained value function to steer the suboptimal policies to the optimal ones. We conduct extensive experiments to test the performance of DIT on both bandit and Markov Decision Process problems. Our results show that DIT achieves superior performance, particularly when the offline dataset contains suboptimal historical data.
more » « less
Free, publicly-accessible full text available August 15, 2026
Conditional Average Treatment Effect Estimation Under Hidden Confounders

Aloui, Ahmed; Dong, Juncheng; Hasan, Ali; Tarokh, Vahid (July 2025, The 41st Conference on Uncertainty in Artificial Intelligence)

Free, publicly-accessible full text available July 21, 2026
CATE EstimationWith Potential Outcome Imputation From Local Regression

Aloui, Ahmed; Dong, Juncheng; Le, Cat P; Tarokh, Vahid (July 2025, The 41st Conference on Uncertainty in Artificial Intelligence)

Free, publicly-accessible full text available July 21, 2026
Topology-aware robust representation balancing for estimating causal effects

Farzam, Amirhossein; Aloui, Ahmed; Tarokh, Vahid; Sapiro, Guillermo (July 2025, NeurIPS 2025 High-dimensional Learning Dynamics Workshop)

Representation learning in high-dimensional spaces faces significant robustness challenges with noisy inputs, particularly with heavy-tailed noise. Arguing that topological data analysis (TDA) offers a solution, we leverage TDA to enhance representation stability in neural networks. Our theoretical analysis establishes conditions under which incorporating topological summaries improves robustness to input noise, especially for heavy-tailed distributions. Extending these results to representation-balancing methods used in causal inference, we propose the *Topology-Aware Treatment Effect Estimation* (TATEE) framework, through which we demonstrate how topological awareness can lead to learning more robust representations. A key advantage of this approach is that it requires no ground-truth or validation data, making it suitable for observational settings common in causal inference. The method remains computationally efficient with overhead scaling linearly with data size while staying constant in input dimension. Through extensive experiments with -stable noise distributions, we validate our theoretical results, demonstrating that TATEE consistently outperforms existing methods across noise regimes. This work extends stability properties of topological summaries to representation learning via a tractable framework scalable for high-dimensional inputs, providing insights into how it can enhance robustness, with applications extending to domains facing challenges with noisy data, such as causal inference.
more » « less
Free, publicly-accessible full text available July 1, 2026
In-Context Reinforcement Learning From Suboptimal Historical Data

Dong, Juncheng; Guo, Moyang; Fang, Ethan X; Yang, Zhuoran; Tarokh, Vahid (July 2025, 2025 International Conference on Machine Learning)

Free, publicly-accessible full text available July 13, 2026
Variational Adversarial Training Towards Policies with Improved Robustness

Dong, Juncheng; Hsu, Hao-Lun; Gao, Qitong; Tarokh, Vahid; Pajic, Miroslav (May 2025, The 28th International Conference on Artificial Intelligence and Statistics)

Free, publicly-accessible full text available May 3, 2026
Off-Policy Evaluation for Human Feedback

Gao, Qitong; Gao, Ge; Dong, Juncheng; Tarokh, Vahid; Chi, Min; Pajic, Miroslav (December 2024, The Thirty-Eighth Annual Conference on Neural Information Processing Systems)

Full Text Available
Bayesian quickest change detection for unnormalized and score-based models

https://doi.org/10.1080/07474946.2024.2373118

Banerjee, Taposh; Tarokh, Vahid (July 2024, Sequential Analysis)

Score-based algorithms are proposed for the quickest detection of changes in unnormalized statistical models. These are models where the densities are known within a normalizing constant. These algorithms can also be applied to score-based models where the score, i.e., the gradient of log density, is known to the decision maker. Bayesian performance analysis is provided for these algorithms and compared with their classical counterparts. It is shown that strong performance guarantees can be provided for these score-based algorithms where the Kullback-Leibeler divergence between pre- and post-change densities is replaced by their Fisher divergence.
more » « less
Full Text Available

« Prev Next »

Search for: All records